Abstract

In this paper, we propose an effective approach for scheduling multiprocessor unit time tasks with chain
precedence onto a large multiprocessor system. The proposed longest chain maximum processor first scheduling
algorithm is proved to be optimal for uniform chains and monotone (non-increasing/non-decreasing) chains of
both splitable and non-splitable multiprocessor unit time tasks. Scheduling arbitrary chains of non-splitable
multiprocessor unit time tasks is proved to be NP-complete. Whether scheduling arbitrary chains of splitable
multiprocessor unit time tasks is NP-complete or solvable in polynomial time remains an open problem. We use
three heuristics for scheduling arbitrary chains: (a) maximum criticality first, (b) longest chain maximum
criticality first and (c) longest chain maximum processor first. We also compare the performance of all three
heuristics and find that the proposed longest chain maximum processor first heuristic performs better in most
cases.


1 Introduction and Motivation 

Modern computing systems contain multiple cores, which enable many applications or tasks to execute
concurrently. Chip multiprocessors exploit the increasing device density on a single chip, so nowadays we expect
core counts on the order of 10^2 to 10^3 on a chip. Also, most applications are parallel by nature, and their
run-time characteristics exhibit time-varying phase behavior [?]. Applications exhibit different performance
metric values in different phases; within a phase, an application has roughly the same values of its performance
metrics. In [3113], Banerjee et al. used instructions per cycle (IPC), instruction level parallelism (ILP) and
L1 cache hits to detect the phases of application execution. As different phases of an application have different
parallelism and memory access characteristics, a scheduling algorithm should take this into account to improve
performance; otherwise the system will be underutilized. In this work, we consider the parallelism characteristics
of the different phases of an application in order to schedule it efficiently. Without loss of generality, we can
consider an application to consist of a sequence of tasks or phases, where each task or phase exhibits a different
degree of parallelism. In this text, we use the terms task and phase interchangeably. Tasks that require more than
one processor at a time are called multiprocessor tasks. Tasks requiring k processors at a time are called
k-width tasks.

In this paper, we are concerned with scheduling N multi-phase applications (chains of multiprocessor tasks, each
with an arbitrary number of phases or tasks) onto an M-processor system. Each phase or task has two
characteristics: its execution time and the number of processors required to execute it. A phase of an
application can be scheduled on any subset of processors of the size it requires. Such an application can
therefore be represented as a collection of multiprocessor tasks with chain precedence constraints; throughout
this paper we also refer to it as a chain of multiprocessor tasks. If the execution time of a multiprocessor task
is 1, we call it a multiprocessor unit time task, and we do not use preemption for multiprocessor unit time tasks.
A k-width multiprocessor unit time task is splitable if it may be split in terms of processors: a task that
requires p processors for 1 time unit may execute for 1 time unit on d processors in the current time slot and
for 1 time unit on the remaining p − d processors in other time slots. If d is an integer, the task is unit
splitable; otherwise it is continuously splitable. In this paper, we try to link the theoretical aspects of
multiprocessor scheduling with the efficient scheduling of multi-phase applications onto multiprocessors, to cope
with modern execution environments (parallel multi-phase applications on large multiprocessor systems).

Non-preemptive scheduling of N independent tasks (uniprocessor tasks) on M > 3 processors is NP-complete.
Similarly, the problem of scheduling multiprocessor tasks is NP-hard for the non-preemptive case with independent
tasks of arbitrary execution time [1]. If the execution time of each phase is restricted to unit time, the
problem is polynomially solvable for the case of an arbitrary but fixed number of required processors [3]. When
preemption is allowed, independent tasks of arbitrary execution time with arbitrary processor requirements can
be scheduled in polynomial time [4]. In most cases, adding precedence constraints between tasks increases the
difficulty. Scheduling multiprocessor applications whose precedence constraints form directed acyclic graphs is
NP-complete, and many heuristics have been proposed for it. In this paper we consider chain precedence, the
simplest precedence constraint. Scheduling multiprocessor tasks with chain precedence is strongly NP-hard for
more than two processors and non-preemptable tasks of arbitrary execution time [4]. Even if the execution time of
chained applications is restricted to unit time, the problem is strongly NP-hard for three processors with
arbitrary processor requirements of the multiprocessor tasks [3].

Figure 1: Multiprocessor tasks with chain precedence

In this paper we consider three types of chains of multiprocessor unit time tasks: (a) uniform chains,
(b) monotonically increasing or decreasing chains and (c) arbitrary chains. We also consider two types of
multiprocessor tasks: (a) splitable tasks and (b) non-splitable tasks. A non-splitable task requires all of its
processors simultaneously, i.e., a k-width task cannot be scheduled on fewer than k processors. A splitable task
can be processed by allocating part of its processors at different times, i.e., a k-width task can be scheduled
on fewer than k processors at one time and receive the remaining processors in a later time slot. Our result is
stronger than that of Blazewicz et al. [10], who consider scheduling unit-time tasks requiring an arbitrary number
of processors between 1 and k, where k is a fixed integer. We also consider both the splitable and the
non-splitable version of the problem.

The rest of the paper is organized as follows. Section 2 describes the problem formulation and its variants.
Section 3 describes previous work. Section 4 presents our proposed algorithms for scheduling uniform chains of
splitable and non-splitable multiprocessor unit time tasks. Similarly, Section 5 describes the algorithms for
scheduling monotone chains of splitable and non-splitable multiprocessor unit time tasks. Section 6 compares three
heuristics for scheduling arbitrary multiprocessor task chains and evaluates their performance in various
scenarios for both splitable and non-splitable task chains. Finally, Section 7 concludes the paper and points out
future work.


2 Problem Formulation 


2.1 Scheduling of multiprocessor tasks on M processor system with chain constraint

A collection of N applications C = {C_1, C_2, ..., C_N} has to be executed by M identical processors. Each
application or chain C_i consists of n_i phases or tasks, where i ∈ {1, 2, ..., N}. The processor requirement of
task T_ij, the j-th phase or task of application C_i, is p_ij, and it satisfies 1 ≤ p_ij ≤ M. The execution time
t_ij of each task T_ij may be arbitrary. Figure 1(a) shows an example application system with 4 applications or
chains (C_1, C_2, C_3 and C_4), which have 4, 3, 5 and 4 phases, respectively. A task of an application cannot
start execution before its predecessor task in the same application has completed.

The optimization criterion considered for multiprocessor scheduling is minimizing the makespan C_max, defined as
the total length of the schedule, i.e., the time when all tasks of all applications are finished:
C_max = max_i {F_i}, where F_i is the finishing time of the i-th chain or application.

As the problem of scheduling chains of multiprocessor tasks with arbitrary execution times is NP-complete, for
simplicity we assume in this paper that the execution time of each task T_ij is unit length, i.e., t_ij = 1 for
i = 1, ..., N and j = 1, ..., n_i. We also assume that there is no communication delay between tasks of an
application or among the applications. In this paper, n denotes the total number of tasks (which differs from
n_i, the number of phases of application C_i), and m or M denotes the number of processors in the system.

Figure 2: Multiprocessor unit time tasks with chain precedence


2.2 Considered types of multiprocessor tasks 

In this paper, we assume two types of tasks: (a) non-splitable tasks, i.e., a task can be processed only when
all the processors it requires are allocated to it at the same time, and (b) splitable tasks, i.e., a task can be
processed by allocating its required processors in pieces at different times. We further categorize
multiprocessor unit time tasks with chain precedence into the following three cases:


1. Uniform chains: All the tasks of a chain have the same processor requirement (degree of parallelism), but
tasks of different chains may have different processor requirements. An example of this kind of task system is
shown in Figure 2(a).

2. Monotone chains: All the tasks of a chain have non-increasing (or non-decreasing) processor requirements. All
the chains of a task system are of one type, either non-increasing or non-decreasing; a monotone task system
considered here does not mix non-increasing and non-decreasing chains. Figures 2(b) and 2(c) show non-increasing
and non-decreasing monotone chains of multiprocessor tasks, respectively.


3. Arbitrary chains: The tasks of a chain have arbitrary processor requirements in arbitrary order, as shown in
Figure 2(d).

We discuss scheduling approaches for all three types of chains of multiprocessor tasks (uniform, monotone and
arbitrary) with both types of multiprocessor tasks (splitable and non-splitable), giving six different types of
chained multiprocessor task systems in total. A lower bound (LB) on the makespan of scheduling chains of
multiprocessor unit time tasks on M processors is

$$LB = \max\left\{ \left\lceil \frac{1}{M} \sum_{i=1}^{N} \sum_{j=1}^{n_i} p_{ij} \right\rceil,\ \max_{1 \le i \le N} n_i \right\} \qquad (1)$$

where p_ij is the processor requirement of task T_ij, N is the number of chains, M is the number of processors
and n_i is the number of phases (tasks) of chain C_i. Let OPT be the optimal makespan produced by an optimal
algorithm. The makespan of an arbitrary algorithm can be written as LB + P_waste, and it satisfies the following
relation:

$$LB \le OPT \le LB + P_{waste} \qquad (2)$$

where P_waste is the minimum average CPU time wastage. The average CPU time wastage is the ratio of the total CPU
time wastage to the number of processors M, and the CPU time wastage in a time slot is the number of processors
that are free in that slot. CPU time is wasted for the following reasons.

1. Some processors may be free in a time slot because they are not sufficient for any ready task (the processor
requirement of every ready task is higher than the number of free processors).

2. Some processors may be free if the total requirement of all ready tasks in a time slot is less than the number
of available processors.
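Equation (1) and the wastage definition above can be checked numerically. The following is a minimal Python
sketch, assuming each chain is given as a list of the processor requirements p_ij of its unit time tasks; the
function names `lower_bound` and `average_waste` are illustrative and not part of the paper.

```python
def lower_bound(chains: list[list[int]], m: int) -> int:
    """LB of equation (1) for unit time tasks; chains[i][j] is p_ij."""
    total_work = sum(sum(chain) for chain in chains)   # sum of p_ij over all tasks
    work_bound = (total_work + m - 1) // m             # integer ceiling of total_work / M
    chain_bound = max((len(chain) for chain in chains), default=0)  # max n_i
    return max(work_bound, chain_bound)


def average_waste(busy_per_slot: list[int], m: int) -> float:
    """P_waste of a schedule: idle processors summed over all slots, divided by M."""
    return sum(m - busy for busy in busy_per_slot) / m
```

Using the values assumed in the example of Section 4.1 (four uniform chains with widths 8, 4, 6, 10 and lengths
4, 3, 5, 4 on M = 16 processors), the total work is 114, so `lower_bound` returns max(⌈114/16⌉, 5) = 8.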
Figure 3: Complexity of scheduling problems on P processors without communication time between tasks


3 Previous Work 

3.1 Scheduling task on multiprocessor 

As described in Section 1, non-preemptive scheduling of independent tasks on m > 3 processors is NP-complete.
Also, the problem of scheduling a finite set of tasks with precedence constraints on a finite set of processors
with the goal of minimizing the makespan is NP-complete in most cases, except for a few simplified ones. Many
polynomial-time heuristics have been suggested, based on assumptions about the structure of the parallel program
and the target parallel architecture [?]. These assumptions include (a) uniform task execution times, (b) zero
inter-task communication times, (c) contention-free communication, (d) full connectivity of parallel processors
and (e) availability of an unlimited number of processors.

However, these assumptions may not hold in the real world for a number of reasons. Even under the above
assumptions, the scheduling problem is NP-complete in the following cases [12]: (a) scheduling tasks with uniform
weights on an arbitrary number of processors and (b) scheduling tasks with weights equal to one or two units on
two processors.

As stated in [12], there are only three special cases for which optimal polynomial-time algorithms exist:
(a) scheduling tree-structured task graphs with uniform node weights on an arbitrary number of processors, solved
in linear time by Hu's [15] highest level first heuristic; (b) scheduling arbitrary task graphs with uniform node
weights on two processors, solved in quadratic time by Graham et al. [?]; and (c) scheduling interval-ordered task
graphs with uniform node weights on an arbitrary number of processors, solved in linear time by Papadimitriou et
al. [?]. However, even in these cases, communication among the tasks of the parallel program is assumed to take
zero time. In [17], Ullman proved that scheduling DAGs with unit node weights on m processors is NP-complete. He
also proved that scheduling DAGs whose node weights are either one or two on two processors is NP-complete.
Figure 3 summarizes the complexity of scheduling problems without communication time between tasks.

3.2 Scheduling multiprocessor task on multiprocessor 

The problem of scheduling multiprocessor tasks on multiprocessors is even harder. Blazewicz et al. [4] proposed an
O(n log n) time algorithm for chained multiprocessor tasks in which the processor requirement is uniform for every
task of a chain, where n is the number of tasks and m is the number of processors. They also proposed an
O(n log n) algorithm for the same type of applications with processor requirements either increasing or
decreasing along a chain, but with only two types of tasks, requiring either 1 processor or k processors.
Gonzalez et al. [3] proposed a polynomial-time preemptive algorithm for scheduling trees in O(n log m) time in
off-line mode. Their algorithm schedules forests of n tasks onto m identical processors while minimizing the
number of preemptions in the worst case. Blazewicz et al. [7] proposed a scheduling algorithm for independent
multiprocessor tasks. They divided the tasks into two sets, t-type and w-type: t-type tasks require one arbitrary
processor for execution, and w-type tasks require two arbitrary processors. The non-preemptive version of
scheduling t- and w-type tasks is NP-complete, but a preemptive algorithm is given that schedules t- and w-type
tasks in O(n log m) time. Blazewicz et al. [10] proposed a linear-time algorithm for scheduling unit-time tasks
requiring an arbitrary number of processors between 1 and k, where k is a fixed integer; if k is not fixed, the
problem is NP-complete. They considered both preemptive and non-preemptive versions for two types of problem
sets. First, if there are only two types of tasks in the set, one requiring 1 processor and the other requiring k
processors, both the non-preemptive and the preemptive versions take O(n) time to schedule the tasks. Second, if
tasks require an arbitrary number of processors between 1 and k, the non-preemptive version has complexity
0 {m^~^.n^) and the preemptive version has complexity of
Li et al. [S] proposed a task-duplication-based scheduling algorithm for fork-join task graphs with complexity
O(n^). Blazewicz et al. [5] proposed a scheduling algorithm for two-processor tasks on a uniform 2-processor
system that schedules w- and t-type tasks in O(n'm + n log n) time.

Algorithm 1 Longest Chain Maximum Processor First (LCMPF)

Input: Set of N application chains and M processors.

1: while not all chains are completely scheduled do
2: Sort chains in non-increasing order of remaining unscheduled chain length; set the number of free processors m = M.
3: while at least one processor is free and some chain with a ready task has not been considered in this time slot do
4: if two or more unconsidered chains share the longest unscheduled length then
5: Select the next ready task T_ij from the one with the maximum processor requirement (initial requirement).
6: else Select the next ready task T_ij from the unconsidered chain with the longest unscheduled length.
7: if the remaining free processors m are at least p_ij of the selected task T_ij then
8: Schedule task T_ij on p_ij of the free processors; m = m − p_ij.
9: else Schedule task T_ij on the remaining m processors, make this task the next ready task of its chain with processor requirement p_ij − m, and set m = 0.
10: Move to the next time slot.

Figure 4: Example of uniform chains of multiprocessor unit time tasks: (a) uniform chains of multiprocessor unit
time tasks; (b) application set with assumed values


4 Uniform Chains of Multiprocessor Unit Time Tasks 


4.1 Splitable multiprocessor tasks 

In this section, we propose an optimal algorithm, longest chain maximum processor first (LCMPF), for minimizing
the makespan using simple rules. As shown in Algorithm 1, it works on two criteria: the first is the length of a
chain and the second is the processor occupancy of a task. The application with the longest chain, i.e., the
maximum number of remaining phases (application length), is scheduled first. If two or more applications or
chains have the same length, the task of the application with the larger (initial) processor requirement is
scheduled first. Figure 4(a) shows an example of a uniform chain multiprocessor unit time task system. The
initial processor requirements of chains C_1, C_2, C_3 and C_4 are p_1, p_2, p_3 and p_4, and the total numbers
of phases or tasks in C_1, C_2, C_3 and C_4 are 4, 3, 5 and 4, respectively.

Pseudocode for LCMPF is shown in Algorithm 1. In this example, assume M = 16, p_1 = 8, p_2 = 4, p_3 = 6 and
p_4 = 10; the resulting uniform chain task system of multiprocessor unit time tasks is shown in Figure 4(b). The
algorithm works as follows. Initially, at time t = 0, the chains sorted by remaining unscheduled length are C_3,
C_1, C_4 and C_2, and the number of free processors is 16, so task T_31 is scheduled first. The number of
remaining free processors is now m = 16 − 6 = 10. The next longest chains are C_1 and C_4, but the initial
processor requirement of C_4 (p_4 = 10) is greater than that of C_1 (p_1 = 8), so the next task to be scheduled is
T_41. The number of remaining free processors is now m = 10 − 10 = 0, so no further task can be scheduled in this
time slot. As there is no free processor, the CPU time wastage in the current time slot (t = 0) is 0.
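The walk-through above can be reproduced with a small simulation. The following Python sketch is one reading of
Algorithm 1 for unit splitable tasks, under these assumptions: chains are lists of per-task processor
requirements, each chain contributes at most one (possibly partial) task per time slot, ties in remaining length
are broken by the initial requirement of the ready task and then by chain index, and a slot ends when no processor
is free or every ready chain has been considered. The function name `lcmpf` is hypothetical.

```python
def lcmpf(chains: list[list[int]], m: int) -> list[list[tuple[int, int]]]:
    """Simulate LCMPF on chains of splitable unit time tasks.

    chains[i] lists p_ij of chain i in precedence order. Returns one list per
    time slot of (chain index, processors given to its ready task).
    """
    nxt = [0] * len(chains)                     # index of the ready task per chain
    left = [c[0] if c else 0 for c in chains]   # processor units still owed to it
    schedule = []
    while any(nxt[i] < len(c) for i, c in enumerate(chains)):
        ready = [i for i, c in enumerate(chains) if nxt[i] < len(c)]
        # Longest remaining chain first; ties by the initial requirement p_ij
        # of the ready task (largest first), then by chain index.
        ready.sort(key=lambda i: (-(len(chains[i]) - nxt[i]), -chains[i][nxt[i]], i))
        free, slot = m, []
        for i in ready:                         # at most one task per chain per slot
            if free == 0:
                break
            give = min(free, left[i])           # split the task if too few processors
            slot.append((i, give))
            free -= give
            left[i] -= give
            if left[i] == 0:                    # task done: next task becomes ready
                nxt[i] += 1
                if nxt[i] < len(chains[i]):
                    left[i] = chains[i][nxt[i]]
        schedule.append(slot)
    return schedule
```

With `chains = [[8] * 4, [4] * 3, [6] * 5, [10] * 4]` (C_1 to C_4 as indices 0 to 3) and M = 16, the first slot
is `[(2, 6), (3, 10)]`, i.e. T_31 on 6 processors and T_41 on 10, matching the text; on this instance the sketch
finishes in 8 slots.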

The time complexity can be analyzed as follows. Sorting the chains requires O(N log N) time, and finding the chain
with the maximum initial processor requirement among the longest chains of the same length takes O(N) time.
Scheduling one time slot therefore takes O(N + N log N) time, so the total time complexity of scheduling a task
system with LCMPF is O(C_max · (N + N log N)), where C_max is the makespan (the number of time slots).


Figure 5: Best and worst case behavior of algorithm A and LCMPF (best-case and worst-case panels for chains C_1
and C_2)


Let us assume that the length of a chain has an upper bound L. In this case, instead of sorting, we can use L
different bins and put each application with remaining length l into bin number l, where the highest bin is L,
the next highest is L − 1 and so on. Putting all the applications into bins takes O(N) time. If we want the
applications in each bin to be sorted by initial processor requirement, inserting all applications into the bins
requires O(N log N) time.

In every time slot, we need to take some applications out of the current highest bin (L) and insert them into the
next highest bin (L − 1). Keeping the applications of each bin sorted by initial processor requirement makes this
step take O(N) time in the worst case. Let α be the number of applications executed in a time slot; then at most
α applications have to be moved from the current highest bin (L) to the next highest bin (L − 1). The time
complexity of this operation is O(α + β), where β is the number of applications already in the next highest bin
(L − 1). The overall time complexity of the algorithm is O(N log N) + C_max · O(α + β).
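The bin idea can be sketched as a bucket structure. The Python class below is an illustrative sketch for uniform
chains (the name `LengthBins` and its methods are assumptions, not from the paper): bins are indexed by remaining
length, each bin keeps (−requirement, chain id) pairs sorted with `bisect.insort`, and reading the bins from L
downwards gives the LCMPF order without sorting all N chains.

```python
from bisect import insort


class LengthBins:
    """Bins indexed by remaining chain length 0..L for uniform chains.

    Bin l holds (-p, chain_id) pairs in ascending order, i.e. the chains of
    length l by non-increasing requirement p (ties: lower chain id first).
    """

    def __init__(self, max_len: int) -> None:
        self.bins: list[list[tuple[int, int]]] = [[] for _ in range(max_len + 1)]

    def insert(self, chain_id: int, length: int, req: int) -> None:
        # Binary search for the position, then a linear-time list shift.
        insort(self.bins[length], (-req, chain_id))

    def move_down(self, chain_id: int, length: int, req: int) -> None:
        """A task of this chain completed: move it from bin `length` to `length - 1`."""
        self.bins[length].remove((-req, chain_id))
        if length > 1:                  # a chain reaching length 0 is finished; drop it
            self.insert(chain_id, length - 1, req)

    def order(self) -> list[int]:
        """Chain ids from the highest bin down: the LCMPF selection order."""
        return [cid for b in reversed(self.bins) for _, cid in b]
```

For the chains of Figure 4(b) (lengths 4, 3, 5, 4 and requirements 8, 4, 6, 10), `order()` initially yields
`[2, 3, 0, 1]`, i.e. C_3, C_4, C_1, C_2; after C_3 and C_4 execute a task and are moved down one bin, it yields
`[0, 2, 3, 1]`. Each move touches only the two bins involved, which is where the O(α + β) per-slot cost comes from.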

Theorem 4.1. The longest chain maximum processor first algorithm always gives the optimal makespan for uniform
chains with splitable tasks.

Proof. We always try to use all the processors in every time slot to make P_waste minimum. CPU time wastage
happens when the total processor requirement of all the ready tasks in a particular time slot is less than M. The
optimality of LCMPF follows from Lemma 4.2 and Lemma 4.3. □

Lemma 4.2. Selecting tasks from the longest chain first does not increase the CPU time wastage in future time
slots.

Proof. Selecting tasks from the longest chain first reduces the chance of free processors in future time slots.
Let there be an algorithm A which gives the optimal makespan T, while our approach LCMPF gives makespan T'. By
equation (1), the lower bound LB is the same for both algorithms. As A is optimal, we can say that

$$P_{waste}(A) \le P_{waste}(LCMPF) \qquad (3)$$

where P_waste(A) and P_waste(LCMPF) are the average CPU time wastage of algorithms A and LCMPF, respectively.

Suppose that our approach selects a task from the longest chain (of length l) while the optimal algorithm A
selects a task from some chain of length l' < l. If this happens repeatedly, zero or more tasks of some
application remain at the end. If two tasks of a single application remain at the end, the CPU wastage increases,
because only one of them can be completed per time slot.

Figure 5 depicts the best and worst case behavior with 16 processors. In the best case, LCMPF selects T_11 first
and A selects either T_11 or T_21 first; in both cases the total CPU time wastage is 16 − 8 = 8. In the worst
case, LCMPF selects T_11 first, so the total CPU time wastage is only 16 − (1 + 8) = 7, but if A selects T_21
first then the total CPU time wastage is (16 − 1) + (16 − 8) = 15 + 8 = 23. So in the worst case, A ends in a
situation where two phases of a single application remain unscheduled, i.e.,

$$P_{waste}(A) > P_{waste}(LCMPF) \qquad (4)$$

Hence, by contradiction, selecting tasks from the longest chain first reduces the CPU time wastage. □

Lemma 4.3. Selecting the task from the chain with the highest (initial) processor requirement does not increase
the CPU time wastage of the current time slot.
Algorithm 2 Longest Chain First (LCF)

Input: Set of N application chains and M processors.

1: while not all chains are completely scheduled do
2: Sort chains in non-increasing order of remaining unscheduled chain length; set m = M.
3: while processors are free and at least one chain is unvisited do
4: if exactly one unvisited chain has the longest unscheduled length then
5: Select the ready task T_ij from that unvisited chain.
6: else Select the ready task T_ij from any of the longest unvisited chains (FCFS basis).
7: if the remaining free processors m are at least p_ij then
8: Schedule task T_ij on p_ij of the free processors; m = m − p_ij; mark this chain as visited.
9: else Mark this chain as visited.
10: Mark all chains as unvisited; T_sch = T_sch + 1.


Proof. Let there be an optimal algorithm A that gives the optimal makespan and, in some time slot, chooses a task
from a chain whose initial processor requirement is at least one less than that of the task chosen by our
approach LCMPF. If A occupies fewer processors, then A incurs more CPU time wastage than LCMPF. □

Our approach LCMPF gives the minimum CPU time wastage, so it gives the optimal makespan.


4.2 Non-splitable multiprocessor tasks 


In this case, our application system consists of N uniform chains of non-splitable multiprocessor unit time tasks.
A non-splitable multiprocessor task can be processed only when all of its required processors are allocated to it
at the same time.

In Algorithm 2, we propose an optimal algorithm for minimizing the makespan. This algorithm is similar to
Algorithm 1, except that it does not use the second criterion (maximum processor occupancy). Using LCMPF would
also give the same makespan for uniform chains of non-splitable multiprocessor unit time tasks, but the time
complexity would increase because of the second criterion. As soon as we find the longest unvisited chain, we
select the next ready task of that chain, regardless of whether other chains of the same length have higher or
lower processor requirements.

To demonstrate the working of our scheduling approach, consider the same set of applications described in
Subsection 4.1 and shown in Figure 4(a). Pseudocode for the proposed longest chain first (LCF) approach is shown
in Algorithm 2. The algorithm works as follows. Initially, at time t = 0, the chains sorted by remaining
unscheduled length are C_3, C_1, C_4 and C_2, and the number of free processors is 16, so task T_31 is scheduled
first. The number of remaining free processors is now m = 16 − 6 = 10, and the next longest unvisited chains are
C_1 and C_4. We could schedule the ready task of either chain; we follow first come first served order, which
affects only the scheduling order and not the makespan, so the next task to be scheduled is T_11. The number of
remaining free processors is now m = 10 − 8 = 2. The next longest unvisited chain is C_4, but its processor
requirement (p_4 = 10) is larger than the number of free processors (m = 2). The same holds for C_2 (p_2 = 4).
Now all chains are visited, so no further task can be scheduled in the current time slot, and the CPU time
wastage in this slot (t = 0) is 2.
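A corresponding sketch of Algorithm 2 for non-splitable tasks follows. It assumes that chains are given as lists
of processor requirements, that ties among equally long chains are resolved in FCFS order by chain index, and that
a chain whose ready task does not fit is marked visited for the rest of the slot. The function name `lcf` is
hypothetical.

```python
def lcf(chains: list[list[int]], m: int) -> list[list[int]]:
    """Simulate LCF on chains of non-splitable unit time tasks.

    chains[i] lists p_ij of chain i in precedence order (each p_ij <= m).
    Returns, per time slot, the indices of the chains whose ready task runs.
    """
    if any(p > m for c in chains for p in c):
        raise ValueError("a non-splitable task needs more than m processors")
    nxt = [0] * len(chains)            # index of the ready task per chain
    schedule = []
    while any(nxt[i] < len(c) for i, c in enumerate(chains)):
        ready = [i for i, c in enumerate(chains) if nxt[i] < len(c)]
        # Longest remaining chain first; ties resolved FCFS (lower index first).
        ready.sort(key=lambda i: (-(len(chains[i]) - nxt[i]), i))
        free, slot = m, []
        for i in ready:                # each chain is visited once per slot
            p = chains[i][nxt[i]]
            if p <= free:              # the whole task must fit, else skip the chain
                slot.append(i)
                free -= p
        for i in slot:
            nxt[i] += 1
        schedule.append(slot)
    return schedule
```

For the same four chains with M = 16, the first slot contains chains 2 and 0 (T_31 and T_11), leaving
16 − (6 + 8) = 2 processors idle as in the example; on this instance the sketch completes in 8 slots.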

If the maximum chain length is arbitrary, the complexity is C_max times the sorting time, i.e.,
O(C_max · N log N). But if the chain lengths are bounded by L, then instead of sorting we can use L different bins
and put each application with remaining length l into bin number l, where the highest bin is L, the next highest
is L − 1 and so on. Putting all applications into bins takes O(N) time.

In every time slot, we need to take some applications out of the current highest bin (L) and insert them into the
next highest bin (L − 1). Let α be the number of applications executed in a time slot; then at most α
applications have to be moved from the current highest bin (L) to the next highest bin (L − 1), so this operation
takes O(α) time. The overall time complexity of the algorithm is O(N) + O(α · C_max).


Theorem 4.4. The longest chain first algorithm always gives the optimal makespan for uniform chains of
non-splitable multiprocessor unit time tasks.

Proof. Let there be an optimal algorithm A which gives schedule length C_max = OPT, producing a sequence of
scheduled multiprocessor tasks in time slots t = 1, ..., C_max. Suppose there are N uniform chains of
non-splitable multiprocessor tasks, namely the a-chain, b-chain, c-chain, d-chain, etc., with processor
requirements a, b, c, d, etc., as shown in Figure 6. The optimal algorithm produces an output sequence of
scheduled tasks such as 1(a, b), 2(a, c), 3(c, d, a), 4(a, d), ..., where i(j, k, ...) means that a task from
chain j, a task from chain k and so on are scheduled in time slot t = i. We can see from the scheduled sequence
that the slots of a nearby pair may be exchanged. For example, 1(a, b) and 2(a, c) can be exchanged into 1(a, c)
and 2(a, b) without affecting optimality or the precedence constraints. In this way, we can sort the scheduling
sequence produced by A to obtain the scheduling sequence produced by LCF. □

Figure 6: Uniform chains of non-splitable multiprocessor unit time tasks

As the optimal sequence can be converted into the LCF sequence without changing C_max, LCF is optimal. However,
we cannot guarantee that LCF is the only algorithm that produces an optimal result; other algorithms may also do
so.


5 Monotone chains of multiprocessor unit time tasks 


Non-increasing chains of multiprocessor unit time tasks can be scheduled optimally. Our proposed approaches LCMPF
(Algorithm 1) and LCF (Algorithm 2) produce the optimal makespan for non-increasing chains of splitable and
non-splitable multiprocessor unit time tasks, respectively. An example of non-increasing chains of multiprocessor
unit time tasks is shown in Figure 2(b). The optimality of LCMPF and LCF for non-increasing chains of splitable
and non-splitable multiprocessor unit time tasks, respectively, is proved in Theorem 5.1 and Theorem 5.2.


Theorem 5.1. LCMPF algorithm schedules non-increasing chains of splitable multiprocessor unit time tasks 
optimally. 

Proof. As shown in Theorem 4.1, P_waste of longest chain maximum processor first is minimum for uniform chains of
multiprocessor unit time tasks. We choose the maximum processor requirement first when chain lengths are equal,
and this rule applies to both uniform and non-increasing chains. In a non-increasing chain, the ready task is the
task with the highest processor requirement in the chain, so choosing by maximum processor requirement selects
it. As the subsequent tasks of the chain require no more processors, P_waste(LCMPF) of the current slot is
minimum, and C_max remains optimal. □


Theorem 5.2. LCF algorithm schedules non-increasing chains of non-splitable multiprocessor unit time tasks 
optimally. 

Proof. As shown in the proof of Theorem 4.4, the schedule sequence given by an optimal algorithm A can be
converted into the schedule sequence given by LCF without affecting the optimality of the makespan for uniform
chains. The same conversion can be done for non-increasing chains without affecting optimality. □


An example of non-decreasing chains of multiprocessor unit time tasks is shown in Figure 2(c). This type of chain
can also be scheduled for both types of tasks (splitable and non-splitable) in the same way as uniform chains:
LCMPF and LCF give the optimal makespan for non-decreasing chains of splitable and non-splitable multiprocessor
tasks, respectively. The optimality of LCMPF and LCF for these cases is proved in Theorem 5.3 and Theorem 5.4.


Theorem 5.3. LCMPF algorithm schedules non-decreasing chains of splitable multiprocessor unit time tasks 
optimally. 


Proof. Non-decreasing chains of multiprocessor unit time tasks can be formed by reversing the order of non-increasing chains. Reversing the chain order does not change the chain length, so the reversed chains can be scheduled optimally by LCMPF. Reading a schedule of the reversed chains backwards in time gives a feasible schedule of the original chains with the same makespan, because every slot keeps the same work (and hence the same processor usage) while the precedence order within each chain is restored. Let the schedule order D we obtain for the non-decreasing chains be o1, o2, o3 and o4. If we reverse the chains and schedule them as non-increasing chains, the order of tasks we obtain is the reverse of D, namely o4, o3, o2 and o1. As the schedule for non-increasing chains is optimal, the schedule for non-decreasing chains is also optimal. □

Figure 7: Criticality of tasks of chain

Theorem 5.4. LCF algorithm schedules non-decreasing chains of non-splitable multiprocessor unit time tasks 
optimally. 

Proof. As shown in Theorem 5.3, non-decreasing chains can be converted into non-increasing chains by reversing their order. LCF gives an optimal solution for non-increasing chains, so the schedule produced by LCF for non-decreasing chains of non-splitable multiprocessor unit time tasks is also optimal.

□ 
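The time-reversal step used in the proofs of Theorems 5.3 and 5.4 can be checked mechanically. The sketch below assumes the non-splitable model (task Tij occupies pij processors for exactly one slot) and represents a schedule as a list of slots of (chain, task) index pairs; the helper names `is_feasible` and `reverse_instance` are hypothetical. It shows that reversing every chain and playing the schedule backwards keeps each slot's processor usage and the chain order, so feasibility and makespan carry over.

```python
# Assumed representation: chains[i][j] = p_ij; a schedule is a list of slots,
# each slot a list of (i, j) pairs meaning task T_ij runs in that slot.

def is_feasible(chains, schedule, M):
    """Every task exactly once, at most M processors per slot,
    and T_ij strictly after T_i,j-1 (chain precedence)."""
    slot_of = {}
    for t, slot in enumerate(schedule):
        if sum(chains[i][j] for i, j in slot) > M:
            return False
        for task in slot:
            if task in slot_of:  # task scheduled twice
                return False
            slot_of[task] = t
    for i, chain in enumerate(chains):
        for j in range(len(chain)):
            if (i, j) not in slot_of:
                return False
            if j > 0 and slot_of[(i, j)] <= slot_of[(i, j - 1)]:
                return False
    return True


def reverse_instance(chains, schedule):
    """Reverse every chain and play the schedule backwards in time."""
    rev_chains = [list(reversed(chain)) for chain in chains]
    rev_schedule = [[(i, len(chains[i]) - 1 - j) for i, j in slot]
                    for slot in reversed(schedule)]
    return rev_chains, rev_schedule


chains = [[4, 3, 1], [2, 2]]                       # non-increasing chains
schedule = [[(0, 0)], [(0, 1), (1, 0)], [(0, 2), (1, 1)]]
print(is_feasible(chains, schedule, 5), is_feasible(*reverse_instance(chains, schedule), 5))
# True True
```

The reversed instance here is the non-decreasing pair of chains [1, 3, 4] and [2, 2], and its schedule has the same three slots as the original.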


6 Arbitrary Chains of multiprocessor unit time tasks 

6.1 Non-splitable tasks 

In this section we consider a system of applications consisting of non-splitable multiprocessor unit time tasks, where the processor requirement can be arbitrary in each phase of a chain. This differs from uniform or monotone chains, in which the processor requirements of the tasks are all equal or in non-increasing (or non-decreasing) order. Figure 2(d) shows an example of arbitrary chains of multiprocessor unit time tasks. As the tasks are multiprocessor unit time tasks that are non-splitable in terms of processors, this problem is proved to be a hard problem [?].

We have used three heuristics for this problem and compared their performance: (a) maximum criticality first (MCF), (b) longest chain maximum criticality first (LCMCF), and (c) longest chain maximum processor first (LCMPF). The LCMPF heuristic is described in Section 4.

As defined in [la], we label each multiprocessor task of a chain with its criticality value, which is the sum of the processor requirements of the task itself and all of its successors. The criticality of task Tij is calculated as

$$CV_{ij} = \sum_{k=j}^{n_i} p_{ik} \qquad (5)$$

where CVij is the criticality of task Tij of chain i, pik is the processor requirement of task Tik and ni is the number of phases (tasks) of chain i. Figure 7 shows the calculated criticality values of the tasks of an example chain; every task now has one more parameter, CVij.
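Equation 5 is a suffix sum over the chain, so all criticality values of a chain can be computed in a single backward pass. A minimal Python sketch, assuming a chain is given as the list [p_i1, ..., p_in_i] of processor requirements; the function name `criticality` is ours, and the example chain is illustrative rather than the one in Figure 7:

```python
from itertools import accumulate


def criticality(chain):
    """Return [CV_i1, ..., CV_in_i] for a chain given as [p_i1, ..., p_in_i].

    CV_ij is the sum of p_ik for k = j..n_i (Equation 5), i.e. a suffix sum.
    """
    # Running totals from the last task backwards, then put back in chain order.
    return list(accumulate(reversed(chain)))[::-1]


print(criticality([3, 5, 2, 4]))  # [14, 11, 6, 4]
```

The first task of a chain always carries the chain's total processor demand, and the last task carries only its own requirement.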

Pseudocode for the MCF heuristic is shown in Algorithm 3. Initially, it calculates the criticality value (using Equation 5) of the tasks of all applications. In every scheduling step, it selects the subset of ready tasks that gives the maximum total criticality while the total number of processors required by the subset is at most M. If the ready task of chain i is Tir, with criticality value CVir and processor requirement pir, then we need to select a subset S of the ready tasks that meets the following objective:

$$\max_{S} \sum_{T_{ir} \in S} CV_{ir} \quad \text{subject to} \quad \sum_{T_{ir} \in S} p_{ir} \le M \qquad (6)$$

This is a 0-1 knapsack problem with capacity M, in which each ready task is an item of weight pir and value CVir.
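Equation 6 can be solved in each slot with the textbook 0-1 knapsack dynamic programme over the capacity M. The sketch below is one possible reading of Algorithm 3, assuming non-splitable tasks with 1 <= pij <= M; the names `mcf_select` and `mcf_schedule` are hypothetical. Each slot costs O(NM), in line with the complexity discussed later in this section.

```python
def mcf_select(ready, M):
    """Choose ready tasks maximising total criticality within M processors (Equation 6).

    ready is a list of (p_ir, CV_ir) pairs, one per ready task. Returns the indices
    of the chosen tasks. Classic 0-1 knapsack dynamic programme: O(len(ready) * M).
    """
    n = len(ready)
    # best[k][c]: maximum criticality using only the first k tasks and at most c processors
    best = [[0] * (M + 1) for _ in range(n + 1)]
    for k, (p, cv) in enumerate(ready, start=1):
        for c in range(M + 1):
            best[k][c] = best[k - 1][c]
            if p <= c and best[k - 1][c - p] + cv > best[k][c]:
                best[k][c] = best[k - 1][c - p] + cv
    # Walk back through the table to recover which tasks were taken.
    chosen, c = [], M
    for k in range(n, 0, -1):
        if best[k][c] != best[k - 1][c]:
            chosen.append(k - 1)
            c -= ready[k - 1][0]
    return sorted(chosen)


def mcf_schedule(chains, M):
    """Run MCF slot by slot; chains[i] lists p_ij. Returns slots of (chain, task) pairs."""
    if not all(1 <= p <= M for chain in chains for p in chain):
        raise ValueError("every task must need between 1 and M processors")
    # Criticality (Equation 5): suffix sums of each chain.
    cv = [[sum(chain[j:]) for j in range(len(chain))] for chain in chains]
    nxt = [0] * len(chains)  # index of the ready task of each chain
    slots = []
    while any(nxt[i] < len(chain) for i, chain in enumerate(chains)):
        ready = [i for i, chain in enumerate(chains) if nxt[i] < len(chain)]
        pick = mcf_select([(chains[i][nxt[i]], cv[i][nxt[i]]) for i in ready], M)
        slot = [(ready[k], nxt[ready[k]]) for k in pick]
        for i, _ in slot:
            nxt[i] += 1
        slots.append(slot)
    return slots


print(mcf_schedule([[4, 3, 1], [2, 2]], 5))
# [[(0, 0)], [(0, 1), (1, 0)], [(0, 2), (1, 1)]]
```

Because every task has pij >= 1, its criticality is positive, so each slot schedules at least one task and the loop terminates.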

Pseudocode for LCMCF is shown in Algorithm 4. It is similar to LCMPF except for the second selection criterion among ready tasks: LCMPF chooses the ready task with the maximum processor requirement, while LCMCF chooses the ready task with the maximum criticality.
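The only difference between the two longest-chain heuristics is the tie-breaking key, which the following sketch makes explicit. It is a simplified reading of Algorithm 4 and its LCMPF counterpart, assuming non-splitable tasks with 1 <= pij <= M, that a chain offers at most one task per slot, and that a task which does not fit in the free processors is skipped for that slot; the function `longest_chain_schedule` and its `rule` parameter are hypothetical.

```python
def longest_chain_schedule(chains, M, rule="LCMPF"):
    """Simplified longest-chain-first scheduling of non-splitable unit time tasks.

    chains[i] lists p_ij. Chains are served in decreasing order of remaining length;
    ties are broken by the ready task's criticality (rule="LCMCF") or by its
    processor requirement (rule="LCMPF"). A chain offers at most one task per slot,
    and a task that does not fit in the free processors is skipped for that slot.
    """
    if rule not in ("LCMCF", "LCMPF"):
        raise ValueError("rule must be 'LCMCF' or 'LCMPF'")
    if not all(1 <= p <= M for chain in chains for p in chain):
        raise ValueError("every task must need between 1 and M processors")
    cv = [[sum(chain[j:]) for j in range(len(chain))] for chain in chains]  # Equation 5
    nxt = [0] * len(chains)  # index of the ready task of each chain

    def key(i):
        j = nxt[i]
        second = cv[i][j] if rule == "LCMCF" else chains[i][j]
        return (len(chains[i]) - j, second)  # remaining length first, then tie-break

    slots = []
    while any(nxt[i] < len(chain) for i, chain in enumerate(chains)):
        order = sorted((i for i, chain in enumerate(chains) if nxt[i] < len(chain)),
                       key=key, reverse=True)  # stable: equal keys keep chain order
        free, slot = M, []
        for i in order:
            p = chains[i][nxt[i]]
            if p <= free:  # remaining processors are at least p_ij
                slot.append((i, nxt[i]))
                free -= p
        for i, _ in slot:
            nxt[i] += 1
        slots.append(slot)
    return slots


example = [[4, 1], [3, 5], [3, 5]]
for rule in ("LCMCF", "LCMPF"):
    print(rule, longest_chain_schedule(example, 6, rule))
# LCMCF [[(1, 0), (2, 0)], [(0, 0)], [(1, 1), (0, 1)], [(2, 1)]]
# LCMPF [[(0, 0)], [(1, 0), (2, 0)], [(1, 1), (0, 1)], [(2, 1)]]
```

In this example both rules finish in 4 slots, which equals the work bound of 21/6 = 3.5 rounded up, but they place different tasks in the first two slots.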

Algorithm 3 Maximum criticality first scheduling (MCF)

Input: Set of N applications (chains) and M processors.

Output: Schedule for the applications' tasks.

1: Calculate the criticality of each task of each application.
2: Initialize the ready queue with the first task of every application.
3: while not all chains are scheduled or the ready queue is not empty do
4:     Select the subset of ready tasks that gives the maximum total criticality and whose total processor requirement is less than or equal to M; schedule this subset in the current time slot.
5:     Update the ready queue with the ready tasks (i.e., tasks whose previous tasks have been scheduled).


Algorithm 4 Longest Chain Maximum Criticality First (LCMCF)

Input: Set of N applications (chains) and M processors.

Output: Schedule for the applications' tasks.

1: Calculate the criticality of each task of each application.
2: while not all chains are scheduled do
3:     Sort the chains in decreasing order of remaining unscheduled chain length.
4:     while processors are free and at least one chain is unvisited do
5:         if there are two or more unvisited chains of the same (longest) length then
6:             Select the ready task Tij with the maximum criticality among all ready tasks of the unvisited longest chains.
7:             if two or more tasks have the same criticality then
8:                 Select any of them.
9:         else
10:            Select the ready task Tij of the longest unvisited chain.
11:        if the remaining processors m are at least pij then
12:            Schedule task Tij on the allocated processors.
13:        else
14:            Mark this chain as visited.
15:    Mark all chains as unvisited.


Figure 8 shows a set of 5 arbitrary applications with non-splitable multiprocessor unit time tasks. The second parameter of each task in each chain is the criticality of that task, calculated by Equation 5. Assuming 20 processors, the lower bound (LB) for the example task system is approximately 8. The scheduled outputs of all three proposed heuristics for this set of applications are shown in tabular form in Figure 9. For this set of applications, all three heuristics give the same makespan Cmax = 9 and the same total CPU time wastage of 21 (Pwaste = 21/20 = 1.05 ≈ 1), but they schedule different sets of tasks in a time slot.
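The figures quoted for this example are mutually consistent, which a few lines of arithmetic confirm. This sketch assumes that the CPU time wastage is the number of idle processor-slots, M · Cmax minus the total processor requirement of all tasks, and that Pwaste is that wastage divided by M; the helper names and the `longest_chain` parameter are hypothetical, and the longest chain of Figure 8 is not reproduced here, so it is left at its default.

```python
import math


def waste_stats(M, cmax, total_work):
    """Idle processor-slots (M * Cmax minus total work) and Pwaste = idle / M (assumed)."""
    idle = M * cmax - total_work
    return idle, idle / M


def work_lower_bound(M, total_work, longest_chain=0):
    """No schedule can beat total_work / M slots, nor the longest chain."""
    return max(total_work / M, longest_chain)


# Example of Figure 8: M = 20 and Cmax = 9 with 21 idle processor-slots.
M, cmax, idle = 20, 9, 21
total_work = M * cmax - idle  # implied sum of all p_ij
lb = work_lower_bound(M, total_work)
print(total_work, waste_stats(M, cmax, total_work), lb, math.ceil(lb))
# 159 (21, 1.05) 7.95 8
```

Under these assumptions the implied total work is 159 processor-slots, and the work bound 159/20 = 7.95 rounds up to the LB of about 8 stated above.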

None of these three heuristics gives the best result in all cases; one heuristic may perform better than the other two in a specific case (the lower the Cmax value, the better the result). As shown in Figure 10, LCMCF and LCMPF perform better than MCF for application set 1 (Figure 10(a)), MCF and LCMPF perform better than LCMCF for application set 2 (Figure 10(b)), and MCF and LCMCF perform better than LCMPF for application set 3 (Figure 10(c)).

Figures 11(a) and 11(b) show the performance of MCF, LCMCF and LCMPF when scheduling 100 and 1000 applications respectively, with different numbers of phases, on a 64-processor system. Figures 11(c) and 11(d) show the performance of the same scheduling approaches for the same application sets on 512 processors. We observed that:

1. For a fixed number of processors, as the number of applications increases, the difference between LB and the makespan decreases, i.e., efficiency in time increases.

2. For fixed numbers of processors and applications, as the upper limit on the number of phases an application can have increases, efficiency in time increases for all three heuristics.

3. In all of these experiments, the average ratio of Cmax to LB is best (lowest) for LCMPF.

Figure 8: Chain of non-splitable multiprocessor tasks with criticality values

Figure 9: Scheduling results of all three heuristics (MCF, LCMCF and LCMPF outputs per time slot) for the task system shown in Figure 8 on 20 processors

Figure 10: Performance of MCF, LCMCF and LCMPF on contradictory examples: (a) Cmax: MCF = 9, LCMCF = ?, LCMPF = 8; (b) Cmax: MCF = 9, LCMCF = 10, LCMPF = 9; (c) Cmax: MCF = 7, LCMCF = 7, LCMPF = 8

Again, Figure 12 shows the performance of MCF, LCMCF and LCMPF when scheduling an application system with 10% and 50% variation in the number of phases among applications, for 500 applications on 100 processors. We observe that:

1. For fixed numbers of processors and applications, when the phase variation changes, the difference between LB and the makespan changes, i.e., efficiency in time increases or decreases.

2. For fixed numbers of processors and applications and a fixed phase variation, as the number of phases an application can have increases, efficiency in time increases for all three heuristics.

The time complexity of MCF is that of solving a 0-1 knapsack problem in each time slot. As the 0-1 knapsack problem can be solved in pseudo-polynomial time with respect to the knapsack capacity and the number of items, in our case each slot costs O(NM), so the total complexity of MCF is O(Cmax · N · M). As described in Section 4, the complexity of LCMCF is the same as that of LCMPF.

Overall, in our experiments LCMPF gives better results than MCF and LCMCF across the tested sets of applications and numbers of phases.

6.2 Splitable tasks 

Scheduling arbitrary chains of splitable multiprocessor unit time tasks is an interesting problem. To our knowledge, no one has found a polynomial-time solution, and no one has proved that this problem is NP-complete. Treating processors as a continuous medium that behaves like electrical charge passing from task to task in a DAG (instead of a chain), the authors of [^] solve this problem iteratively with complexity O(e^2 + ne + I(n + e)), where n is the number of tasks (nodes), e is the number of edges in the precedence graph and I is the number of iterations of the algorithm. They use optimality conditions imposed by a set of nonlinear equations on the flow of processing power (processors) and on the completion times of independent paths of execution, which are analogous to Kirchhoff's laws of electrical circuit theory. Our aim, however, is to solve the problem using a discrete approach.

Figure 11: Comparison between MCF, LCMCF and LCMPF: (a) M = 64, Apps = 100; (b) M = 64, Apps = 1000; (c) M = 512, Apps = 100; (d) M = 512, Apps = 1000

We have also used the same MCF, LCMCF and LCMPF heuristics to schedule this kind of application onto multicore systems. As the multiprocessor tasks are splitable, we use the fractional knapsack in the MCF heuristic. Experiments show that all three heuristics produce exactly the same result for the randomly generated examples. We also observe that all three heuristics perform equally if the sum of the processor requirements of all ready multiprocessor tasks is greater than M in every schedule time slot except the last: every such slot is then fully utilized, so all three heuristics reach the same makespan. If this condition is violated, LCMPF performed better than the other two heuristics.
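For splitable tasks, the 0-1 knapsack in MCF becomes a fractional knapsack, which a greedy pass solves exactly. The sketch below shows one time slot under two stated assumptions: a split task's criticality counts in proportion to the processors it receives, so tasks are ranked by CVir/pir, and any unserved demand stays with the task for later slots. The name `mcf_fractional_slot` and the task ids are hypothetical.

```python
def mcf_fractional_slot(ready, M):
    """One MCF time slot for splitable tasks using a fractional knapsack.

    ready maps a task id to (remaining processor demand, criticality). Tasks are
    served in decreasing criticality per processor (assumed value density); the last
    one served may get only part of its demand, the rest staying for later slots.
    Returns a dict task id -> processors assigned in this slot.
    """
    alloc, free = {}, M
    by_density = sorted(ready.items(), key=lambda item: item[1][1] / item[1][0], reverse=True)
    for tid, (demand, _cv) in by_density:
        if free == 0:
            break
        give = min(demand, free)
        alloc[tid] = give
        free -= give
    return alloc


print(mcf_fractional_slot({"A": (4, 10), "B": (3, 9), "C": (5, 5)}, 6))
# {'B': 3, 'A': 3}
```

Whenever the ready demand is at least M, this greedy step fills all M processors, which is the situation in which the three heuristics were observed to behave identically.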


7 Conclusion and Future work 

Scheduling that considers phase behavior improves system performance. Our proposed LCMPF approach for scheduling uniform and monotone chains of multiprocessor unit time tasks is proved to be optimal. If the multiprocessor tasks are non-splitable, the LCF approach is optimal for these chains, so the processor occupancy criterion of the multiprocessor tasks need not be considered.

Scheduling arbitrary chains of non-splitable multiprocessor unit time tasks is NP-complete. In this case, our proposed LCMPF-based heuristic performs better than the MCF and LCMCF heuristics. We believe that scheduling arbitrary chains of splitable multiprocessor unit time tasks is still an open problem. We have also compared the performance of the proposed LCMPF with the MCF and LCMCF heuristics for scheduling this kind of task. In future, we plan to solve the scheduling of arbitrary chains of splitable multiprocessor unit time tasks, and to address the same problem with other restrictive precedence constraints and/or with a communication model.